Add Nemotron-Colembed-v2 results for Vidore V1-V3 #408
Samoed merged 3 commits into embeddings-benchmark:main
Model Results Comparison
Results for nvidia/llama-nemotron-colembed-vl-3b-v2
| task_name | nvidia/llama-nemotron-colembed-vl-3b-v2 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|
| Vidore2BioMedicalLecturesRetrieval | 0.6319 | 0.6547 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore2ESGReportsHLRetrieval | 0.7311 | 0.7698 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-7B-v1 | False |
| Vidore2ESGReportsRetrieval | 0.5864 | 0.6244 | TomoroAI/tomoro-colqwen3-embed-4b | False |
| Vidore2EconomicsReportsRetrieval | 0.5859 | 0.6219 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | False |
| Vidore3ComputerScienceRetrieval | 0.7709 | 0.7752 | VAGOsolutions/SauerkrautLM-ColQwen3-8b-v0.1 | False |
| Vidore3EnergyRetrieval | 0.6488 | 0.6841 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3FinanceEnRetrieval | 0.6423 | 0.6508 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3FinanceFrRetrieval | 0.4441 | 0.4910 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3HrRetrieval | 0.6228 | 0.6398 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3IndustrialRetrieval | 0.5171 | 0.5441 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3PharmaceuticalsRetrieval | 0.6604 | 0.6636 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3PhysicsRetrieval | 0.4693 | 0.5013 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| VidoreArxivQARetrieval | 0.9040 | 0.9380 | VAGOsolutions/SauerkrautLM-ColQwen3-8b-v0.1 | True |
| VidoreDocVQARetrieval | 0.6717 | 0.6696 | VAGOsolutions/SauerkrautLM-ColQwen3-4b-v0.1 | True |
| VidoreInfoVQARetrieval | 0.9468 | 0.9492 | nvidia/llama-nemoretriever-colembed-3b-v1 | True |
| VidoreShiftProjectRetrieval | 0.9200 | 0.9293 | jinaai/jina-embeddings-v4 | False |
| VidoreSyntheticDocQAAIRetrieval | 1.0000 | 1.0000 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | True |
| VidoreSyntheticDocQAEnergyRetrieval | 0.9802 | 0.9763 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | True |
| VidoreSyntheticDocQAGovernmentReportsRetrieval | 0.9795 | 0.9889 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | True |
| VidoreSyntheticDocQAHealthcareIndustryRetrieval | 0.9889 | 1.0000 | VAGOsolutions/SauerkrautLM-ColQwen3-4b-v0.1 | True |
| VidoreTabfquadRetrieval | 0.9725 | 0.9596 | nomic-ai/colnomic-embed-multimodal-7b | False |
| VidoreTatdqaRetrieval | 0.8104 | 0.8404 | VAGOsolutions/SauerkrautLM-ColQwen3-8b-v0.1 | True |
| Average | 0.7493 | 0.7669 | nan | - |
The model has high performance on these tasks: VidoreSyntheticDocQAEnergyRetrieval, VidoreTabfquadRetrieval, VidoreDocVQARetrieval
Training datasets: HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, JinaVDRArxivQARetrieval, JinaVDRDocQAAI, JinaVDRDocQAEnergyRetrieval, JinaVDRDocQAGovReportRetrieval, JinaVDRDocQAHealthcareIndustryRetrieval, JinaVDRDocVQARetrieval, JinaVDRInfovqaRetrieval, JinaVDRTatQARetrieval, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoNQ-VN, NanoNQRetrieval, SQuAD, StackExchangeClustering, StackExchangeClustering-VN, StackExchangeClustering.v2, VDRMultilingualRetrieval, VidoreArxivQARetrieval, VidoreDocVQARetrieval, VidoreInfoVQARetrieval, VidoreSyntheticDocQAAIRetrieval, VidoreSyntheticDocQAEnergyRetrieval, VidoreSyntheticDocQAGovernmentReportsRetrieval, VidoreSyntheticDocQAHealthcareIndustryRetrieval, VidoreTatdqaRetrieval, VisRAG-Ret-Train-In-domain-data, VisRAG-Ret-Train-Synthetic-data, WebInstructSub, docmatix-ir, wiki-ss-nq
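The "high performance" list above appears to be the set of tasks where the model's score is strictly greater than the previous maximum. That rule is inferred from the table, not documented in this PR. A minimal Python sketch, using five rows copied from the 3b table (the helper name `tasks_beating_max` is hypothetical):

```python
# Rows copied from the 3b table above: task -> (model score, previous max result).
rows = {
    "VidoreSyntheticDocQAAIRetrieval": (1.0000, 1.0000),
    "VidoreSyntheticDocQAEnergyRetrieval": (0.9802, 0.9763),
    "VidoreTabfquadRetrieval": (0.9725, 0.9596),
    "VidoreDocVQARetrieval": (0.6717, 0.6696),
    "VidoreTatdqaRetrieval": (0.8104, 0.8404),
}


def tasks_beating_max(rows):
    """Return tasks whose score is strictly above the previous max (assumed rule)."""
    return [task for task, (score, best) in rows.items() if score > best]


print(tasks_beating_max(rows))
```

Under this assumed rule a tie, such as VidoreSyntheticDocQAAIRetrieval at 1.0000, is not listed, which matches the list above.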
Results for nvidia/nemotron-colembed-vl-4b-v2
| task_name | nvidia/nemotron-colembed-vl-4b-v2 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|
| Vidore2BioMedicalLecturesRetrieval | 0.6432 | 0.6547 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore2ESGReportsHLRetrieval | 0.7143 | 0.7698 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-7B-v1 | False |
| Vidore2ESGReportsRetrieval | 0.6148 | 0.6244 | TomoroAI/tomoro-colqwen3-embed-4b | False |
| Vidore2EconomicsReportsRetrieval | 0.6075 | 0.6219 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | False |
| Vidore3ComputerScienceRetrieval | 0.7856 | 0.7752 | VAGOsolutions/SauerkrautLM-ColQwen3-8b-v0.1 | False |
| Vidore3EnergyRetrieval | 0.6747 | 0.6841 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3FinanceEnRetrieval | 0.6502 | 0.6508 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3FinanceFrRetrieval | 0.4901 | 0.4910 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3HrRetrieval | 0.6239 | 0.6398 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3IndustrialRetrieval | 0.5391 | 0.5441 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3PharmaceuticalsRetrieval | 0.6610 | 0.6636 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3PhysicsRetrieval | 0.4886 | 0.5013 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| VidoreArxivQARetrieval | 0.9203 | 0.9380 | VAGOsolutions/SauerkrautLM-ColQwen3-8b-v0.1 | True |
| VidoreDocVQARetrieval | 0.6739 | 0.6696 | VAGOsolutions/SauerkrautLM-ColQwen3-4b-v0.1 | True |
| VidoreInfoVQARetrieval | 0.9331 | 0.9492 | nvidia/llama-nemoretriever-colembed-3b-v1 | True |
| VidoreShiftProjectRetrieval | 0.9226 | 0.9293 | jinaai/jina-embeddings-v4 | False |
| VidoreSyntheticDocQAAIRetrieval | 0.9926 | 1.0000 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | True |
| VidoreSyntheticDocQAEnergyRetrieval | 0.9619 | 0.9763 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | True |
| VidoreSyntheticDocQAGovernmentReportsRetrieval | 0.9802 | 0.9889 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | True |
| VidoreSyntheticDocQAHealthcareIndustryRetrieval | 0.9852 | 1.0000 | VAGOsolutions/SauerkrautLM-ColQwen3-4b-v0.1 | True |
| VidoreTabfquadRetrieval | 0.9805 | 0.9596 | nomic-ai/colnomic-embed-multimodal-7b | False |
| VidoreTatdqaRetrieval | 0.8119 | 0.8404 | VAGOsolutions/SauerkrautLM-ColQwen3-8b-v0.1 | True |
| Average | 0.7571 | 0.7669 | nan | - |
The model has high performance on these tasks: VidoreTabfquadRetrieval, Vidore3ComputerScienceRetrieval, VidoreDocVQARetrieval
Training datasets: JinaVDRArxivQARetrieval, JinaVDRDocQAAI, JinaVDRDocQAEnergyRetrieval, JinaVDRDocQAGovReportRetrieval, JinaVDRDocQAHealthcareIndustryRetrieval, JinaVDRDocVQARetrieval, JinaVDRInfovqaRetrieval, JinaVDRTatQARetrieval, VDRMultilingualRetrieval, VidoreArxivQARetrieval, VidoreDocVQARetrieval, VidoreInfoVQARetrieval, VidoreSyntheticDocQAAIRetrieval, VidoreSyntheticDocQAEnergyRetrieval, VidoreSyntheticDocQAGovernmentReportsRetrieval, VidoreSyntheticDocQAHealthcareIndustryRetrieval, VidoreTatdqaRetrieval, VisRAG-Ret-Train-In-domain-data, VisRAG-Ret-Train-Synthetic-data, docmatix-ir, wiki-ss-nq
Results for nvidia/nemotron-colembed-vl-8b-v2
| task_name | nvidia/nemotron-colembed-vl-8b-v2 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|
| Vidore2BioMedicalLecturesRetrieval | 0.6616 | 0.6547 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore2ESGReportsHLRetrieval | 0.7315 | 0.7698 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-7B-v1 | False |
| Vidore2ESGReportsRetrieval | 0.6056 | 0.6244 | TomoroAI/tomoro-colqwen3-embed-4b | False |
| Vidore2EconomicsReportsRetrieval | 0.6076 | 0.6219 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | False |
| Vidore3ComputerScienceRetrieval | 0.7929 | 0.7752 | VAGOsolutions/SauerkrautLM-ColQwen3-8b-v0.1 | False |
| Vidore3EnergyRetrieval | 0.6982 | 0.6841 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3FinanceEnRetrieval | 0.6729 | 0.6508 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3FinanceFrRetrieval | 0.5154 | 0.4910 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3HrRetrieval | 0.6632 | 0.6398 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3IndustrialRetrieval | 0.5603 | 0.5441 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3PharmaceuticalsRetrieval | 0.6719 | 0.6636 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| Vidore3PhysicsRetrieval | 0.5084 | 0.5013 | TomoroAI/tomoro-colqwen3-embed-8b | False |
| VidoreArxivQARetrieval | 0.9308 | 0.9380 | VAGOsolutions/SauerkrautLM-ColQwen3-8b-v0.1 | True |
| VidoreDocVQARetrieval | 0.6805 | 0.6696 | VAGOsolutions/SauerkrautLM-ColQwen3-4b-v0.1 | True |
| VidoreInfoVQARetrieval | 0.9456 | 0.9492 | nvidia/llama-nemoretriever-colembed-3b-v1 | True |
| VidoreShiftProjectRetrieval | 0.9330 | 0.9293 | jinaai/jina-embeddings-v4 | False |
| VidoreSyntheticDocQAAIRetrieval | 1.0000 | 1.0000 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | True |
| VidoreSyntheticDocQAEnergyRetrieval | 0.9789 | 0.9763 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | True |
| VidoreSyntheticDocQAGovernmentReportsRetrieval | 0.9889 | 0.9889 | ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-3B-v1 | True |
| VidoreSyntheticDocQAHealthcareIndustryRetrieval | 0.9963 | 1.0000 | VAGOsolutions/SauerkrautLM-ColQwen3-4b-v0.1 | True |
| VidoreTabfquadRetrieval | 0.9774 | 0.9596 | nomic-ai/colnomic-embed-multimodal-7b | False |
| VidoreTatdqaRetrieval | 0.8337 | 0.8404 | VAGOsolutions/SauerkrautLM-ColQwen3-8b-v0.1 | True |
| Average | 0.7707 | 0.7669 | nan | - |
The model has high performance on these tasks: VidoreSyntheticDocQAEnergyRetrieval, VidoreTabfquadRetrieval, VidoreShiftProjectRetrieval, Vidore3ComputerScienceRetrieval, Vidore3EnergyRetrieval, VidoreDocVQARetrieval, Vidore3PharmaceuticalsRetrieval, Vidore2BioMedicalLecturesRetrieval, Vidore3FinanceEnRetrieval, Vidore3HrRetrieval, Vidore3IndustrialRetrieval, Vidore3PhysicsRetrieval, Vidore3FinanceFrRetrieval
Training datasets: JinaVDRArxivQARetrieval, JinaVDRDocQAAI, JinaVDRDocQAEnergyRetrieval, JinaVDRDocQAGovReportRetrieval, JinaVDRDocQAHealthcareIndustryRetrieval, JinaVDRDocVQARetrieval, JinaVDRInfovqaRetrieval, JinaVDRTatQARetrieval, VDRMultilingualRetrieval, VidoreArxivQARetrieval, VidoreDocVQARetrieval, VidoreInfoVQARetrieval, VidoreSyntheticDocQAAIRetrieval, VidoreSyntheticDocQAEnergyRetrieval, VidoreSyntheticDocQAGovernmentReportsRetrieval, VidoreSyntheticDocQAHealthcareIndustryRetrieval, VidoreTatdqaRetrieval, VisRAG-Ret-Train-In-domain-data, VisRAG-Ret-Train-Synthetic-data, docmatix-ir, wiki-ss-nq
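As a sanity check, the Average row looks like the unweighted mean over the 22 tasks. That reading is inferred from the numbers rather than stated in the PR. A short Python sketch, with the 8b scores and the Max result column copied from the table above (variable names are illustrative):

```python
from statistics import mean

# Scores copied in table order from the nvidia/nemotron-colembed-vl-8b-v2 column.
scores_8b = [
    0.6616, 0.7315, 0.6056, 0.6076, 0.7929, 0.6982, 0.6729, 0.5154,
    0.6632, 0.5603, 0.6719, 0.5084, 0.9308, 0.6805, 0.9456, 0.9330,
    1.0000, 0.9789, 0.9889, 0.9963, 0.9774, 0.8337,
]

# "Max result" column, same task order.
max_scores = [
    0.6547, 0.7698, 0.6244, 0.6219, 0.7752, 0.6841, 0.6508, 0.4910,
    0.6398, 0.5441, 0.6636, 0.5013, 0.9380, 0.6696, 0.9492, 0.9293,
    1.0000, 0.9763, 0.9889, 1.0000, 0.9596, 0.8404,
]

# Assumption: "Average" is a plain arithmetic mean over all tasks.
avg_8b = mean(scores_8b)
avg_max = mean(max_scores)

print(f"8b average:  {avg_8b:.4f}")
print(f"max average: {avg_max:.4f}")
```

Both values round to the figures in the Average row (0.7707 and 0.7669), which is consistent with a plain mean that does not separate tasks flagged as in training data from the rest.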

@KennethEnevoldsen Hello. Can you rerun the failing test? Our MTEB MR was merged. Thanks.

I tried to rerun it, but a new version needs to be released before the test can run. I'll try to use the main repo instead.

@Samoed @KennethEnevoldsen Thanks for merging our MR. I'd like to follow up on the LB update. We still do not see our three models' results on the ViDoRe LB for the public tasks. Could you let us know if an update is expected soon? I also created this issue to request evaluation on the private tasks, FYI. Thanks.

Yeah, I found that our Docker image is not building every time. I'll fix this. I saw your request to run on the private tasks. I'll run it tomorrow.

Added Vidore v1-v3 benchmark results for three multimodal embedding models: llama-nemotron-colembed-vl-3b-v2, nemotron-colembed-vl-4b-v2 and nemotron-colembed-vl-8b-v2
Checklist
mteb/models/model_implementations/, this can be as an API. Instruction on how to add a model can be found here